(57) Abstract

The present invention proposes a new method and apparatus for the enhancement of source coding systems. The invention employs 
bandwidth reduction (101) prior to or in the encoder (103), followed by spectral-band replication (105) at the decoder (107). This is 
accomplished by the use of new transposition methods, in combination with spectral envelope adjustments. Reduced bitrate at a given 
perceptual quality or an improved perceptual quality at a given bitrate is offered. The invention is preferably integrated in a hardware 
or software codec, but can also be implemented as a separate processor in combination with a codec. The invention offers substantial 
improvements practically independent of codec type and technological progress. 






WO 98/57436 



PCT/IB98/00893 



SOURCE CODING ENHANCEMENT USING SPECTRAL-BAND REPLICATION 






TECHNICAL FIELD 

In source coding systems, digital data is compressed before transmission or storage to reduce the required bitrate or 
storing capacity. The present invention relates to a new method and apparatus for the improvement of source coding 
systems by means of Spectral Band Replication (SBR). Substantial bitrate reduction is achieved while maintaining 
the same perceptual quality or conversely, an improvement in perceptual quality is achieved at a given bitrate. This 
is accomplished by means of spectral bandwidth reduction at the encoder side and subsequent spectral band 
replication at the decoder, whereby the invention exploits new concepts of signal redundancy in the spectral domain. 

BACKGROUND OF THE INVENTION 

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. Natural 
audio coding is commonly used for music or arbitrary signals at medium bitrates, and generally offers wide audio 
bandwidth. Speech coders are basically limited to speech reproduction but can on the other hand be used at very low 
bitrates, albeit with low audio bandwidth. Wideband speech offers a major subjective quality improvement over 
narrow band speech. Increasing the bandwidth not only improves intelligibility and naturalness of speech, but also 
facilitates speaker recognition. Wideband speech coding is thus an important issue in next generation telephone
systems. Further, due to the tremendous growth of the multimedia field, transmission of music and other non-speech 
signals at high quality over telephone systems is a desirable feature. 

A high-fidelity linear PCM signal is very inefficient in terms of bitrate versus the perceptual entropy. The CD 
standard dictates 44.1 kHz sampling frequency, 16 bits per sample resolution and stereo. This equals a bitrate of
1411 kbit/s. To drastically reduce the bitrate, source coding can be performed using split-band perceptual audio 
codecs. These natural audio codecs exploit perceptual irrelevancy and statistical redundancy in the signal. Using the 
best codec technology, approximately 90% data reduction can be achieved for a standard CD-format signal with 
practically no perceptible degradation. Very high sound quality in stereo is thus possible at around 96 kbit/s, i.e. a 
compression factor of approximately 15:1. Some perceptual codecs offer even higher compression ratios. To
achieve this, it is common to reduce the sample-rate and thus the audio bandwidth. It is also common to decrease the 
number of quantization levels, allowing occasionally audible quantization distortion, and to employ degradation of 
the stereo field, through intensity coding. Excessive use of such methods results in annoying perceptual degradation. 
Current codec technology is near saturation and further progress in coding gain is not expected. In order to improve 
the coding performance further, a new approach is necessary. 
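As a worked check of the figures quoted above, the following short Python sketch recomputes the linear PCM bitrate of a CD-format signal and the compression factor obtained at 96 kbit/s. The function and variable names are illustrative only and do not appear in the original text.

```python
# Figures taken from the paragraph above; all names are illustrative.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2  # stereo


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of linear PCM in kbit/s (1 kbit = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


cd_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
coded_kbps = 96.0

# Ratio of uncoded to coded bitrate, and the fraction of data removed.
compression_factor = cd_kbps / coded_kbps
data_reduction = 1.0 - coded_kbps / cd_kbps

print(f"{cd_kbps:.1f} kbit/s, {compression_factor:.1f}:1, {data_reduction:.1%}")
```

The computed ratio is slightly below 15:1 and the data reduction slightly above 90%, which agrees with the approximate values stated in the text.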

The human voice and most musical instruments generate quasistationary signals that emerge from oscillating
systems. According to Fourier theory, any periodic signal may be expressed as a sum of sinusoids with the
frequencies f, 2f, 3f, 4f, 5f etc., where f is the fundamental frequency. The frequencies form a harmonic series. A
bandwidth limitation of such a signal is equivalent to a truncation of the harmonic series. Such a truncation alters the
perceived timbre, or tone colour, of a musical instrument or voice, and yields an audio signal that will sound "muffled"
or "dull", and intelligibility may be reduced. The high frequencies are thus important for the subjective impression
of sound quality.
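The effect of a bandwidth limitation on a harmonic series can be illustrated with a few lines of Python. The sketch below simply lists the harmonics of a fundamental that fall below a given band limit; the fundamental of 440 Hz and the two band limits are arbitrary example values, not figures from the present text.

```python
def harmonics_below(f0_hz, bandwidth_hz):
    """Harmonic frequencies f, 2f, 3f, ... that lie at or below the band limit."""
    count = int(bandwidth_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


# Example values (assumptions for illustration only).
full = harmonics_below(440.0, 16000.0)    # wide audio bandwidth
limited = harmonics_below(440.0, 4000.0)  # narrow audio bandwidth

print(len(full), "partials survive at 16 kHz;", len(limited), "survive at 4 kHz")
```

The narrower band limit keeps only the first few terms of the series, which is the truncation referred to above.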
a codebook. This information is used to continuously adjust and equalise the replicated highband. The present
SBR-1 method offers the advantage of post-processing, i.e. no modifications are needed at the encoder side. A
broadcaster will gain in channel utilisation or will be able to offer improved perceptual quality or a combination of 
both. Existing bitstream syntax and standards can be used without modification. 

SBR-2, intended for the improvement of high quality codec applications, is a double-ended process where, in 
addition to the transmitted lowband signal according to SBR-1, the spectral envelope of the highband is encoded and 
transmitted. Since the variations of the spectral envelope have a much lower rate than the highband signal
components, only a limited amount of information needs to be transmitted in order to successfully represent the
spectral envelope. SBR-2 can be used to improve the performance of current codec technologies with no or minor
modifications of existing syntax or protocols, and as a valuable tool for future codec development.



Both SBR-1 and SBR-2 can be used to replicate smaller passbands of the lowband when such bands are shut down
by the encoder as stipulated by the psychoacoustic model under bit-starved conditions. This results in improvement
of the perceptual quality by spectral replication within the lowband in addition to spectral replication outside the
lowband. Further, SBR-1 and SBR-2 can also be used in codecs employing bitrate scalability, where the perceptual
quality of the signal at the receiver varies depending on transmission channel conditions. This usually implies
annoying variations of the audio bandwidth at the receiver. Under such conditions, the SBR methods can be used
successfully in order to maintain a constantly high bandwidth, again improving the perceptual quality.

The present invention operates on a continuous basis, replicating any type of signal contents, i.e. tonal or non-tonal
(noise-like and transient signals). In addition, the present spectral replication method creates a perceptually accurate
replica of the discarded bands from available frequency bands at the decoder. Hence, the SBR method offers a
substantially higher level of coding gain or perceptual quality improvement compared to prior art methods. The
invention can be combined with such prior art codec improvement methods; however, no performance gain is
expected due to such combinations.



The SBR-method comprises the following steps: 

- encoding of a signal derived from an original signal, where frequency bands of the signal are discarded and
the discarding is performed prior to or during encoding, forming a first signal,

- during or after decoding of the first signal, transposing frequency bands of the first signal, forming a second
signal,

- performing spectral envelope adjustment, and

- combining the decoded signal and the second signal, forming an output signal.
The passbands of the second signal may be set not to overlap or partly overlap the passbands of the first signal, and
may be set in dependence of the temporal characteristics of the original signal and/or the first signal, or transmission 
channel conditions. The spectral envelope adjustment is performed based on estimation of the original spectral 
envelope from said first signal or on transmitted envelope information of the original signal. 






The present invention includes two basic types of transposers: multiband transposers and time-variant pattern search
prediction transposers, having different properties. A basic multiband transposition may be performed according to
the present invention by the following:

- filtering the signal to be transposed through a set of $N \ge 2$ bandpass filters with passbands comprising the
frequencies $[f_1, \ldots, f_N]$ respectively, forming $N$ bandpass signals,

- shifting the bandpass signals in frequency to regions comprising the frequencies $M[f_1, \ldots, f_N]$, where $M \ne 1$ is
the transposition factor, and

- combining the shifted bandpass signals, forming the transposed signal.

Alternatively, this basic multiband transposition may be performed according to the invention by the following:

- bandpass filtering the signal to be transposed using an analysis filterbank or transform of such a nature
that real- or complex-valued subband signals of lowpass type are generated,

- an arbitrary number of channels $k$ of said analysis filterbank or transform are connected to channels $Mk$, $M \ne 1$,
in a synthesis filterbank or transform, and

- the transposed signal is formed using the synthesis filterbank or transform.
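The channel patching described in the steps above can be sketched in Python under simplifying assumptions. Here a plain DFT over a single frame stands in for the analysis/synthesis transform, which is an assumption made purely for illustration; a practical system would use a running filterbank. The helper names `dft`, `idft` and `patch_channels` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT, used here as a stand-in analysis transform."""
    n = len(x)
    return [sum(x[p] * cmath.exp(-2j * math.pi * k * p / n) for p in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, used here as a stand-in synthesis transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * p / n) for k in range(n)) / n
            for p in range(n)]


def patch_channels(x, M, k_lo, k_hi):
    """Connect analysis channels k in [k_lo, k_hi] to synthesis channels M*k."""
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    for k in range(k_lo, k_hi + 1):
        if M * k < n // 2:
            Y[M * k] = X[k]
            Y[n - M * k] = X[k].conjugate()  # mirror bin keeps the output real
    return [v.real for v in idft(Y)]


# A partial in channel 3 of a 32-point frame is moved to channel 6 for M = 2.
frame = [math.cos(2 * math.pi * 3 * p / 32) for p in range(32)]
transposed = patch_channels(frame, M=2, k_lo=1, k_hi=7)
```

In this sketch the channel contents are copied without any phase modification, corresponding to the basic rather than the improved multiband transposition.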

An improved multiband transposition according to the invention incorporates phase adjustments, enhancing the 
performance of the basic multiband transposition. 

Pattern search prediction transposition according to the present invention may be performed by:

- performing transient detection on the first signal,

- determining which segment of the first signal to be used when duplicating/discarding parts of the first signal
depending on the outcome of the transient detection,

- adjusting statevector and codebook properties depending on the outcome of the transient detection, and

- searching for synchronisation points in the chosen segment of the first signal, based on the synchronisation point
found in the previous synchronisation point search.

The SBR methods and apparatuses according to the present invention offer the following features:

1. The methods and apparatuses exploit new concepts of signal redundancy in the spectral domain.

2. The methods and apparatuses are applicable on arbitrary signals.

3. Each harmonic set is individually created and controlled.

4. All replicated harmonics are generated in such a manner as to form a continuation of the existing harmonic
series.

5. The spectral replication process is based on transposition and creates no or imperceptible artifacts.

6. The spectral replication can cover multiple smaller bands and/or a wide frequency range.

7. With the SBR-1 method, existing bitstream syntax and standards can be used without modification.

8. The SBR-2 method can be implemented in accordance with most standards and protocols with no or
minor modifications.

9. The SBR-2 method offers the codec designer a new powerful compression tool.

10. The coding gain is significant.
The most attractive application relates to the improvement of various types of low bitrate codecs, such as MPEG 1/2
Layer I/II/III [U.S. Pat. 5,040,217], MPEG 2/4 AAC, Dolby AC-2/3, NTT TwinVQ [U.S. Pat. 5,684,920],
AT&T/Lucent PAC etc. The invention is also useful in high-quality speech codecs such as wide-band CELP and
SB-ADPCM G.722 etc. to improve perceived quality. The above codecs are widely used in multimedia, in the
telephone industry, on the Internet as well as in professional applications. T-DAB (Terrestrial Digital Audio
Broadcasting) systems use low bitrate protocols that will gain in channel utilisation by using the present method, or
improve quality in FM and AM DAB. Satellite S-DAB will gain considerably, due to the excessive system costs
involved, by using the present method to increase the number of programme channels in the DAB multiplex.
Furthermore, for the first time, full bandwidth audio real-time streaming over the Internet is achievable using low
bitrate telephone modems.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the 
invention, with reference to the accompanying drawings, in which: 
Fig. 1 illustrates SBR incorporated in a coding system according to the present invention;

Fig. 2 illustrates spectral replication of upper harmonics according to the present invention;

Fig. 3 illustrates spectral replication of inband harmonics according to the present invention;

Fig. 4 is a block diagram for a time-domain implementation of a transposer according to the present invention;

Fig. 5 is a flow-chart representing a cycle of operation for the pattern-search prediction transposer according to the
present invention;

Fig. 6 is a flow-chart representing the search for synchronisation point according to the present invention;

Fig. 7a - 7b illustrates the codebook positioning during transients according to the present invention;

Fig. 8 is a block diagram for an implementation of several time-domain transposers in connection with a suitable
filterbank, for SBR operation according to the present invention;

Fig. 9a - 9c are block diagrams representing a device for STFT analysis and synthesis configured for generation
of 2nd order harmonics according to the present invention;

Fig. 10a - 10b are block diagrams of one sub-band with a linear frequency shift in the STFT device according to
the present invention;

Fig. 11 shows one sub-band using a phase-multiplier according to the present invention;

Fig. 12 illustrates how 3rd order harmonics are generated according to the present invention;

Fig. 13 illustrates how 2nd and 3rd order harmonics are generated simultaneously according to the present
invention;

Fig. 14 illustrates generation of a non-overlapping combination of several harmonic orders according to the
present invention;

Fig. 15 illustrates generation of an interleaved combination of several harmonic orders according to the present
invention;

Fig. 16 illustrates generation of broadband linear frequency shifts;

Fig. 17 illustrates how sub-harmonics are generated according to the present invention;

Fig. 18a - 18b are block diagrams of a perceptual codec;

Fig. 19 shows a basic structure of a maximally decimated filterbank;

Fig. 21 is a block diagram for the improved multiband transposition in a maximally decimated filterbank
operating on subband signals according to the present invention;

Fig. 22 is a flowchart representing the improved multiband transposition in a maximally decimated filterbank
operating on subband signals according to the present invention;

Fig. 23 illustrates subband samples and scalefactors of a typical codec;

[...] of envelope [...] in SBR;

[...] SBR-2 according to the present invention.
where $N$ is the number of sinusoids, hereafter referred to as partials, [...]
representation $X(f)$ is bandlimited to the range 0 to $f_{max}$, 201. The signal contents in the range $f_{max}/M$ to $Qf_{max}/M$,
where $Q$ is the desired bandwidth expansion factor, $1 < Q \le M$, is extracted by means of a bandpass filter, forming a
bandpass signal with spectrum $X_{BP}(f)$ 203. The bandpass signal is transposed a factor $M$, forming a second
bandpass signal with spectrum covering the range $f_{max}$ to $Qf_{max}$ 205. The spectral envelope of this signal is
adjusted by means of a programme-controlled equaliser, forming a signal with spectrum $X_E(f)$ 207. This signal is
then combined with a delayed version of the input signal in order to compensate for the delay imposed by the
bandpass filter and transposer, whereby an output signal with spectrum $Y(f)$ covering the range 0 to $Qf_{max}$ is formed
209. Alternatively, bandpass filtering may be performed after the transposition $M$, using cut-off frequencies $f_{max}$ and
$Qf_{max}$. By using multiple transposers, simultaneous generation of different harmonic orders is of course possible.
The above scheme may also be used to "fill in" stopbands within the input signal, as shown in Fig. 3, where the
input signal has a stopband extending from $f_0$ to $Qf_0$ 301. A passband $[f_0/M, Qf_0/M]$ is then extracted 303, transposed
a factor $M$ to $[f_0, Qf_0]$ 305, envelope adjusted 307 and combined with the delayed input signal forming the output
signal with spectrum $Y(f)$ 309.
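The band edges involved in the replication scheme of Fig. 2 follow directly from the lowband limit, the transposition factor M and the bandwidth expansion factor Q. The following Python sketch computes them; the function names are hypothetical, and the check 1 < Q <= M reflects the requirement that the extracted band must lie within the lowband.

```python
def source_band(f_max_hz, M, Q):
    """Band extracted from the lowband: [f_max/M, Q*f_max/M]."""
    if not (1 < Q <= M):
        raise ValueError("expected 1 < Q <= M so that the source band lies in the lowband")
    return (f_max_hz / M, Q * f_max_hz / M)


def replicated_band(f_max_hz, M, Q):
    """Band covered after transposition by the factor M: [f_max, Q*f_max]."""
    lo, hi = source_band(f_max_hz, M, Q)
    return (M * lo, M * hi)


# Example values (assumptions): 8 kHz lowband, second order transposition.
src = source_band(8000.0, M=2, Q=2)
dst = replicated_band(8000.0, M=2, Q=2)
```

With these example values the band from 4 kHz to 8 kHz is used to generate the band from 8 kHz to 16 kHz.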

An approximation of an exact transposition may be used. According to the present invention, the quality of such
approximations is determined using dissonance theory. A criterion for dissonance is presented by Plomp ["Tonal
Consonance and Critical Bandwidth", R. Plomp, W. J. M. Levelt, JASA, Vol. 38, 1965], and states that two partials
are considered dissonant if the frequency difference is within approximately 5 to 50% of the bandwidth of the
critical band in which the partials are situated. For reference, the critical bandwidth for a given frequency can be
approximated by

\[ cb(f) = 25 + 75\left(1 + 1.4\left(\frac{f}{1000}\right)^{2}\right)^{0.69} \tag{3} \]

with $f$ and $cb$ in Hz. Further, Plomp states that the human auditory system cannot discriminate two partials if they
differ in frequency by approximately less than five percent of the critical bandwidth in which they are situated. The
exact transposition in Eq. 2 is approximated by

\[ y_{approx}(n) = \sum_{i=0}^{N-1} e_i(n)\cos\left(2\pi\left(Mf_i \pm \Delta f_i\right)n/f_s + \beta_i\right), \tag{4} \]

where $\Delta f_i$ is the deviation from the exact transposition. If the input partials form a harmonic series, a hypothesis of
the invention states that the deviations from the harmonic series of the transposed partials must not exceed five
percent of the critical bandwidth in which they are situated. This would explain why prior art methods give
unsatisfactory "harsh" and "rough" results, since broad band linear frequency shifts yield a much larger deviation
than acceptable. When prior art methods produce more than one partial for only one input partial, the partials must
nevertheless be within the above stated deviation limit, as to be perceived as one partial. This again explains the
poor results obtained with prior art methods using nonlinearities etc., since they produce intermodulation partials not
within the limit of deviation.
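The deviation limit stated above can be evaluated numerically. The Python sketch below implements the critical bandwidth approximation of Eq. 3, cb(f) = 25 + 75(1 + 1.4(f/1000)^2)^0.69, and tests whether a transposed partial lies within five percent of the critical bandwidth around its target frequency. The function names and the example frequencies are assumptions made for illustration.

```python
def critical_bandwidth_hz(f_hz):
    """Eq. 3: cb(f) = 25 + 75 * (1 + 1.4 * (f/1000)^2)^0.69, with f and cb in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def within_deviation_limit(target_hz, actual_hz, fraction=0.05):
    """True if |actual - target| is at most `fraction` of cb at the target frequency."""
    limit = fraction * critical_bandwidth_hz(target_hz)
    return abs(actual_hz - target_hz) <= limit


# Example (assumed values): harmonic series on 300 Hz, target harmonic at 4200 Hz.
exact_ok = within_deviation_limit(4200.0, 4200.0)
# A broad band linear shift of 4000 Hz moves the 300 Hz partial to 4300 Hz,
# 100 Hz away from the nearest harmonic of the series.
shifted_ok = within_deviation_limit(4200.0, 4300.0)
```

In this example the linearly shifted partial falls well outside the limit, in line with the explanation given above for the roughness of broad band linear frequency shifts.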






When using the above transposition based method of spectral replication according to the present invention, the 
following important properties are achieved: 
- Normally, no frequency domain overlap occurs between replicated harmonics and existing partials.
annoying dissonance or artifacts.

- The spectral envelope of the replicated harmonics forms a smooth continuation of the input signal spectral
envelope, perceptually matching the original envelope.

Transposition based on time-variant pattern search prediction

Various ways to design the required transposers exist. Typical time-domain implementations expand the signal in
time by duplicating signal segments based on the pitch-period. This signal is subsequently read out at a different
rate. Unfortunately such methods are strictly dependent on pitch-detection for accurate time splicing of the signal
segments. Furthermore, the constraint to work on pitch-period based signal segments makes them sensitive to
transients. Since the detected pitch-period can be much longer than the actual transient, the risk of duplicating the
entire transient rather than just expanding it in time is obvious. Another type of time-domain algorithm obtains time
expansion/compression of speech signals by utilising pattern search prediction of the output signal ["Pattern Search
Prediction of Speech", R. Bogner, T. Li, Proc. ICASSP '89, Vol. 1, May 1989; "Time-Scale Modification of Speech
based on a nonlinear Oscillator Model", G. Kubin, W. B. Kleijn, IEEE, 1994]. This is a form of granular synthesis,
where the input signal is divided into small parts, granules, used to synthesise the output signal. This synthesis is [...]
that the segments used to form the output signal are not dependent on the pitch period, and thus the non-trivial task
of pitch detection is not required. Nevertheless, problems with rapidly changing signal amplitudes remain in these
methods, and high quality transposition tends to raise high computational demands. However, an improved
time-domain pitch shifter/transposer is now presented, where the use of transient detection and dynamic system
parameters produces a more accurate transposition for high transposition factors during both stationary (tonal or
non-tonal) and transient sounds, at a low computational cost.

Referring to the drawings wherein like numerals indicate like elements, there is shown in Fig. 4 nine separate
modules: a transient-detector 401, a window position adjuster 403, a codebook generator 405, a synchronisation
signal selector 407, a synchronisation position memory 409, a minimum difference estimator 411, an output
segment memory 413, a mix unit 415, and a down sampler 417. The input signal is fed to both the codebook [...]
module 403. This module stipulates the size and position of the window that is multiplied with the input signal when [...]
407, provided it has been connected to another transposer. If this synchronisation position is within the codebook, it is
used and an output segment is produced. Otherwise the codebook is sent to the minimum difference estimator 411,
which returns a new synchronisation position. The new output segment is windowed together with the previous
output segment in the mix module 415 and subsequently down sampled in module 417.

In order to clarify the explanation, a state space representation is introduced. Here, the state vectors, or granules,
represent the input and output signals. The input signal is represented by a statevector $\mathbf{x}(n)$:

\[ \mathbf{x}(n) = [x(n),\, x(n-D),\, x(n-2D),\, \ldots,\, x(n-(N-1)D)] \tag{5} \]

which is obtained from $N$ delayed samples of the input signal, where $N$ is the dimension of the state vector and $D$ is
the delay between the input samples used to build the vector. The granular mapping yields the sample $x(n)$ following
each statevector $\mathbf{x}(n-1)$. This gives Eq. 6, where $a(\cdot)$ is the mapping:

\[ x(n) = a(\mathbf{x}(n-1)). \tag{6} \]
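The statevector of Eq. 5 collects N samples of the input signal spaced D samples apart, starting at the current sample and going backwards in time. A minimal Python sketch of this construction could look as follows; the function name `state_vector` and the example signal are hypothetical.

```python
def state_vector(x, n, N, D):
    """Eq. 5: [x(n), x(n-D), x(n-2D), ..., x(n-(N-1)D)]."""
    if n - (N - 1) * D < 0:
        raise IndexError("not enough past samples to build the statevector")
    return [x[n - i * D] for i in range(N)]


# Example (assumed values): dimension N = 3, delay D = 2, current sample n = 9.
signal = list(range(10))
vec = state_vector(signal, n=9, N=3, D=2)
```

For the example signal the statevector contains the samples at positions 9, 7 and 5.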

In the present method the granular mapping is used to determine the next output based on the former output, using a 
state transition codebook. The codebook of length L is continuously rebuilt containing the statevectors and the next 
sample following each statevector. Each statevector is separated from its neighbour by K samples; this enables the 
system to adjust the time resolution depending on the characteristics of the currently processed signal, where K 
equal to one represents the finest resolution. The input signal segment used to build the codebook is chosen based on 
the position of a possible transient and the synchronisation position in the previous codebook. 




This means that the mapping a(.), theoretically, is evaluated for all transitions included in the codebook: 





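\[ a\left(\begin{bmatrix} \mathbf{x}(n-L) \\ \mathbf{x}(n-L+K) \\ \vdots \\ \mathbf{x}(n-1) \end{bmatrix}\right) = \begin{bmatrix} x(n-L+1) \\ x(n-L+K+1) \\ \vdots \\ x(n) \end{bmatrix} \tag{7} \]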



With this transition codebook, the new output $y(n)$ is calculated by searching for the statevector in the codebook
most similar to the current statevector $\mathbf{y}(n-1)$. This nearest-neighbour search is done by calculating the minimum
difference and gives the new output sample:

\[ y(n) = a(\mathbf{y}(n-1)). \tag{8} \]

However, the system is not limited to work on a sample by sample basis, but is preferably operated on a segment by 
segment basis. The new output segment is windowed and added, mixed, with the previous output segment, and 
subsequently down sampled. The pitch transposition factor is determined by the ratio of the input segment length 
represented by the codebook and the output segment length read out of the codebook. 
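The codebook construction and nearest-neighbour search described above may be sketched in Python as follows. The sketch works sample by sample for clarity, whereas the text prefers segment-by-segment operation. The sum of squared differences used as the difference measure is an assumption, since the text only specifies a minimum difference, and all function names are hypothetical.

```python
def state_vector(x, n, N, D):
    """Statevector [x(n), x(n-D), ..., x(n-(N-1)D)]."""
    return [x[n - i * D] for i in range(N)]


def build_codebook(x, n, L, K, N, D):
    """Pairs (statevector at m, sample at m+1) for m = n-L, n-L+K, ... below n."""
    entries = []
    for m in range(n - L, n, K):
        if m - (N - 1) * D >= 0:  # skip positions lacking enough history
            entries.append((state_vector(x, m, N, D), x[m + 1]))
    return entries


def next_output(codebook, y_state):
    """Return the sample following the codebook statevector closest to y_state."""
    def difference(entry):
        # Assumed measure: sum of squared differences between statevectors.
        return sum((a - b) ** 2 for a, b in zip(entry[0], y_state))
    return min(codebook, key=difference)[1]


# Example (assumed values): a periodic ramp as input signal.
signal = [0, 1, 2, 3] * 5
codebook = build_codebook(signal, n=19, L=8, K=1, N=2, D=1)
predicted = next_output(codebook, [2, 1])
```

With K equal to one every position in the codebook window contributes a statevector; a larger K thins the codebook and lowers the time resolution, as described in the text.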



Returning to the drawings, in Fig. 5 and Fig. 6 flowcharts are presented, displaying the cycle of operation of the
transposer. In 501 the input data is represented; a transient detection 503 is performed on a segment of the input
signal; the search for transients is performed on a segment length equal to the output segment length. If a transient is
found 505, the position of the transient is stored 507 and the parameters L (representing the codebook length), K
(representing the distance in samples between each statevector), and D (representing the delay between samples in
each statevector) are adjusted 509. The position of the transient is compared to the position of the previous output
segment 511, in order to determine whether the transient has been processed. If so 513, the position of the codebook
(window L), and the parameters K, L, and D are adjusted 515. After the necessary parameter adjustments, based on
the outcome of the transient detection, the search for a new synchronisation, or splicing point takes place 517. This
procedure is displayed in Fig. 6. First a new synchronisation point is calculated based on the previous 601,
according to:

\[ \mathit{Sync\_pos} = \mathit{Sync\_pos\_old} + S \cdot M - S, \tag{9} \]

where Sync_pos and Sync_pos_old are the new and old synchronisation positions respectively, S is the length of the
input segment being processed, and M is the transposition factor. This synchronisation point is used to compare the
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no,, , poin, is « for m ms . pB6niitd ^ 

~~«»»--P^..*«17 lta .^«, I , BtalI ^ 5 ^ 

« is «.red 5 ,9 and a new „ , s „ad m from „. ^ a , ^ J 
g,ven svncbrcnrsadon pota. This segmen, y and ^ M fc ^ ^ ^ 

transposition factor 525, and slored in Die output buffer 527. 
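The initial estimate of the new synchronisation point in Eq. 9, Sync_pos = Sync_pos_old + S*M - S, is a one-line computation. The Python sketch below states it explicitly, together with a simple bounds check against the codebook length; the bounds helper and all names are hypothetical and serve only as illustration.

```python
def predicted_sync_pos(sync_pos_old, S, M):
    """Eq. 9: Sync_pos = Sync_pos_old + S*M - S."""
    return sync_pos_old + S * M - S


def is_within_codebook(sync_pos, codebook_length):
    """Hypothetical helper: True if the position indexes a valid codebook entry."""
    return 0 <= sync_pos < codebook_length


# Example (assumed values): input segment of 64 samples, transposition factor 2.
new_pos = predicted_sync_pos(sync_pos_old=100, S=64, M=2)
usable = is_within_codebook(new_pos, codebook_length=512)
```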

In Fig. 7 the behaviour of the system under transient conditions, regarding the [...] of the [...]

Prior to the transient, the codebook [...] is positioned "to the left" of segment [...]

Correlation segment [...], representing a part of the previous output, is used to find synchronisation point [...] in [...]

synchronisation points prior to the transient.

Most pitch transposers, or time expanders, based on pattern search prediction give satisfactory results for speech [...]
However, [...]

degradation during rapidly changing signal segments. Furthermore, alteration of the length of the signal segment
[...] refined codebook search based on the results of the preceding [...]

[...] of the computational [...]

[...] the following, illustrative but not limiting, example. In Fig. 8 three time expansion modules are used in order to
generate second, third and fourth order harmonics. Since, in this example, each time domain expansion/transposer
works on a wideband signal, it is beneficial to adjust the spectral envelope of the source frequency range prior to
transposition, yields the desired spectral envelope. The transposers 807, 809 and 811 are interconnected in order to
share synchronisation position information. This is based on the fact that under certain conditions, a high correlation
will occur between the synchronisation positions found in the codebook during correlation in the separate
transposing units. Assume, as an example and again not limiting the scope of the invention, the fourth order
harmonic transposer works on a time frame basis half of that of the second order harmonic transposer but at twice
the duty cycle. Assume further, that the codebooks used for the two expanders are the same and that the
synchronisation positions of the two time-domain expanders are labelled sync_pos4 and sync_pos2, respectively.
This yields the following relation:

\[ \mathit{sync\_pos2} = \mathit{sync\_pos4} - n \cdot 4 \cdot S - \mathit{sync\_offset}, \quad \text{for } n = 1, 2, 3, 4, \ldots, \tag{10} \]

where

\[ \mathit{sync\_offset} = \mathit{sync\_pos4} - \mathit{sync\_pos2}, \quad \text{for } n = 0, \tag{11} \]

and S is the length of the input segment represented by the codebook. This is valid as long as neither of the
synchronisation position pointers reaches the end of the codebook. During normal operation n is increased by one
for each time-frame processed by the second order harmonic transposer, and when the codebook end inevitably is
reached, by either of the pointers, the counter n is set to n = 0, and sync_pos2 and sync_pos4 are computed
individually. Similar results are obtained for the third order harmonic transposer when connected to the fourth order
harmonic transposer.
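The relation between the synchronisation positions of the second and fourth order harmonic transposers, sync_pos2 = sync_pos4 - n*4*S - sync_offset with sync_offset = sync_pos4 - sync_pos2 at n = 0, can be written out as a short Python sketch. The function names and the numerical values are assumptions chosen for illustration only.

```python
def sync_offset(sync_pos4_at_0, sync_pos2_at_0):
    """Eq. 11: offset between the two pointers, measured at n = 0."""
    return sync_pos4_at_0 - sync_pos2_at_0


def sync_pos2_from_pos4(sync_pos4, n, S, offset):
    """Eq. 10: sync_pos2 = sync_pos4 - n*4*S - sync_offset."""
    return sync_pos4 - n * 4 * S - offset


# Example (assumed values): both positions are found individually at n = 0 ...
offset = sync_offset(sync_pos4_at_0=10, sync_pos2_at_0=4)
# ... and afterwards sync_pos2 is derived from sync_pos4 without a new search.
derived = sync_pos2_from_pos4(sync_pos4=100, n=1, S=8, offset=offset)
```

Deriving one pointer from the other in this way avoids a separate codebook search in the second transposer for as long as neither pointer reaches the end of the codebook.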



The above-presented use of several interconnected time-domain transposers, for the creation of higher order
harmonics, introduces substantial computational reduction. Furthermore, the proposed use of time-domain
transposers in connection with a suitable filterbank presents the opportunity to adjust the envelope of the created
spectrum while maintaining the simplicity and low computational cost of a time domain transposer, since these,
more or less, may be implemented using fixed point arithmetic and solely additive/subtractive operations.

Other, illustrative but not limiting, examples of the present invention are:

- the use of a time domain transposer within each subband in a subband filter bank, thus reducing the signal
complexity for each transposer.

- the use of a time domain transposer in combination with a frequency domain transposer, thus enabling the
system to use different methods for transposition depending on the characteristics of the input signal being
processed.

- the use of a time domain transposer in a wideband speech codec, operating on for instance the residual signal
obtained after linear prediction.

It should be recognised that the method outlined above may be advantageously used for timescale modification only,
by simply omitting the sample rate conversion. Further it is understood, that although the outlined method focuses
on pitch transposing to a higher pitch, i.e. time expansion, the same principles apply when transposing to a lower
pitch, i.e. time compression, as is obvious to those skilled in the art.
Filter bank based transposition

Various new [...] filter bank based [...]

which is advantageously accomplished by [...]

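The N-point STFT of a discrete-time signal x(n) is defined by

$$X_k(n) = \sum_{p=-\infty}^{\infty} x(p)\,h(n-p)\,e^{-j\omega_k p}, \tag{12}$$

where k = 0, 1, ..., N-1, ω_k = 2πk/N and h(n) is a window. If the window satisfies the conditions

$$h(0) = 1, \qquad h(n) = 0 \ \text{ for } \ n = \pm N, \pm 2N, \pm 3N, \ldots \tag{13}$$

an inverse transform exists and is given by

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X_k(n)\,e^{j\omega_k n}. \tag{14}$$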
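As a numerical check of the analysis/synthesis pair in Eq. 12 to 14, the following minimal sketch evaluates the STFT channel signals directly from the definition and sums them back. The triangular window, the test signal and all function names are assumptions chosen for illustration; they are not taken from the description. Samples outside the finite test signal are treated as zero.

```python
import cmath
import math

def stft_channel(x, h, k, n, N):
    """X_k(n) = sum_p x(p) h(n - p) exp(-j w_k p), with w_k = 2 pi k / N (Eq. 12)."""
    w_k = 2.0 * math.pi * k / N
    total = 0j
    for p, x_p in enumerate(x):
        total += x_p * h(n - p) * cmath.exp(-1j * w_k * p)
    return total

def istft_sample(x, h, n, N):
    """x(n) = (1/N) sum_k X_k(n) exp(j w_k n) (Eq. 14)."""
    total = 0j
    for k in range(N):
        w_k = 2.0 * math.pi * k / N
        total += stft_channel(x, h, k, n, N) * cmath.exp(1j * w_k * n)
    return (total / N).real

N = 8

def triangular_window(m):
    # Assumed window: h(0) = 1 and h(m) = 0 for |m| >= N, so the conditions of Eq. 13 hold.
    return max(0.0, 1.0 - abs(m) / N)

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
reconstructed = [istft_sample(signal, triangular_window, n, N) for n in range(32)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

The direct double sum is used only for clarity; a practical implementation would use the FFT-based rearrangement.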



The direct transform may be interpreted as an analyser, see Fig. 9a, consisting of a bank of N BP-filters with impulse
responses [...] X_k(n) have small bandwidths and are normally downsampled 905. Eq. 12 need thus only be evaluated at
n = rR, where R is the decimation factor and r is the new time variable. X_k(n) can be recovered from X_k(rR) by
upsampling, see Fig. 9b, i.e. insertion of zeros 907 followed by LP-filtering 909. The inverse transform may be
interpreted as a [...]

The STFT and ISTFT may be rearranged in order to use the DFT and IDFT, which makes the use of FFT algorithms
possible.

Fig. 9c shows a patch 915 for generation of second harmonics, M = 2, with N = 32. [...] 31 correspond to negative
frequencies. [...] corresponding to an input signal delay path. Analysis channels k, where 4 ≤ k ≤ 7, are connected to
synthesis channels Mk, M = 2, which shift the [...]

This yields

$$y(n) = \frac{2}{N}\Big[x(n) * h(n)\cos(\omega_k n)\Big]\cos\big((M-1)\omega_k n\big) - \frac{2}{N}\Big[x(n) * h(n)\sin(\omega_k n)\Big]\sin\big((M-1)\omega_k n\big) \tag{15}$$

where M = 2. Eq. 15 may be interpreted as a BP-filtering of the input signal, followed by a linear frequency shift or
Upper Side Band (USB) modulation, i.e. single side band modulation using the upper side band, see Fig. 10b, where
1005 and 1007 form a Hilbert transformer, 1009 and 1011 are multipliers with cosine and sine carriers and 1013 is a
difference stage which selects the upper sideband. Clearly, such a multiband BP and SSB method may be
implemented explicitly, i.e. without filterbank patching, in the time or frequency domain, allowing arbitrary
selection of individual passbands and oscillator frequencies.

According to Eq. 15, a sinusoid with the frequency ω_i within the passband of analysis channel k yields a harmonic at
the frequency Mω_k + (ω_i - ω_k). Hence the method, referred to as basic multiband transposition, only generates exact
harmonics for input signals with frequencies ω_i = ω_k, where 4 ≤ k ≤ 7. However, if the number of filters is
sufficiently large, the deviation from an exact transposition is negligible, see Eq. 4. Further, the transposition is
made exact for quasi-stationary tonal signals of arbitrary frequencies by inserting the blocks denoted P 917 (Fig. 9c),
provided every analysis channel contains at most one partial. In this case X_k(rR) are complex exponentials with
frequencies equal to the differences between the partial frequencies ω_i and the centre frequencies ω_k of the analysis
filters. To obtain the exact transposition M, these frequencies must be increased by a factor M, modifying the above
frequency relationship to ω_i → Mω_k + M(ω_i - ω_k) = Mω_i. The frequencies of X_k(rR) are equal to the time derivatives
of their respective unwrapped phase angles and may be estimated using first order differences of successive phase
angles. The frequency estimates are multiplied by M and synthesis phase angles are calculated using those new
frequencies. However, the same result, aside from a phase constant, is obtained in a simplified way by multiplying
the analysis arguments by M directly, eliminating the need for frequency estimation. This is described in Fig. 11,
representing the blocks 917. Thus X_k(rR), where 4 ≤ k ≤ 7 in this example, are converted from rectangular to polar
coordinates, illustrated by the blocks R→P, 1101. The arguments are multiplied by M = 2, 1103, and the magnitudes
are unaltered. The signals are then converted back to rectangular coordinates (P→R) 1105, forming the signals
Y_Mk(rR), and fed to synthesiser channels according to Fig. 9c. This improved multiband transposition method thus
has two stages: the patch provides a coarse transposition, as in the basic method, and the phase-multipliers provide
fine frequency corrections. The above multiband transposition methods differ from traditional pitch shifting
techniques using the STFT, where lookup-table oscillators are used for the synthesis or, when the ISTFT is used for
the synthesis, the signal is time-stretched and decimated, i.e. no patch is used.
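The phase-multiplier of Fig. 11 can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name, the offset `delta` and the amplitude 0.8 are hypothetical. It converts each complex subband sample to polar form, multiplies the argument by M, leaves the magnitude unaltered and converts back, and then estimates the frequency before and after from first order phase differences.

```python
import cmath

def phase_multiplier(sample, M):
    """Rectangular -> polar, multiply the argument by M, keep the magnitude, polar -> rectangular."""
    magnitude, argument = cmath.polar(sample)
    return cmath.rect(magnitude, M * argument)

M = 2
delta = 0.05  # assumed offset (radians per subband sample) between a partial and the channel centre
X = [0.8 * cmath.exp(1j * delta * r) for r in range(16)]  # one partial in one analysis channel
Y = [phase_multiplier(v, M) for v in X]

# First order differences of successive phase angles estimate the frequency of each sequence.
freq_in = cmath.phase(X[1] / X[0])
freq_out = cmath.phase(Y[1] / Y[0])
```

Here `freq_out` equals M times `freq_in` without any explicit frequency estimation inside the multiplier, which is the simplification the text describes.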

The harmonic patch of Fig. 9c is easily modified for other transposition factors than two. Fig. 12 shows a patch 1203
for generation of 3rd order harmonics, where 1201 are the analysis channels and 1205 are the synthesis channels.
Different harmonic orders may be created simultaneously as shown in Fig. 13, where 2nd and 3rd order harmonics are
used. Fig. 14 illustrates a non-overlapping combination of 2nd, 3rd and 4th order harmonics. The lowest possible
harmonic number is used as high in frequency as possible. Above the upper limit of the destination range of
harmonic M, harmonic M+1 is used. Fig. 15 demonstrates a method of mapping all synthesiser channels (N = 64,
channels 0-32 shown). All highband channels with non prime-number indices are mapped according to the
relation k_dest = M k_source between source and destination channel number, where M is the smallest integer ≥ 2
that satisfies the condition that k_source lies in the lowband and k_dest in the highband. Hence, no synthesiser channel
[...] (only connections with M = 2, 3, 4, 5 are shown in Fig. 15).
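One possible reading of the mapping rule for non prime-number highband channels is sketched below. It is an assumption-laden illustration, not the mapping of Fig. 15 itself: the lowband is taken to be channels 1 to `lowband_top` (a hypothetical boundary), the lowest harmonic order that lands on a lowband source is chosen, sources below channel 2 are excluded so that prime-number destinations stay unmapped, and `max_order` is a hypothetical cap on M.

```python
def map_highband_channels(lowband_top, num_channels, max_order=None):
    """Return {k_dest: (k_source, M)} with k_dest = M * k_source and the smallest usable M."""
    mapping = {}
    for k_dest in range(lowband_top + 1, num_channels):
        highest = k_dest // 2 if max_order is None else max_order
        for M in range(2, highest + 1):
            k_source = k_dest // M
            # The destination must be an exact multiple of a source channel in the lowband.
            if k_dest % M == 0 and 2 <= k_source <= lowband_top:
                mapping[k_dest] = (k_source, M)
                break  # smallest M first, so the lowest harmonic order is preferred
    return mapping

# Channels 0-32 as in the figure description; lowband_top=7 is an assumed value.
connections = map_highband_channels(lowband_top=7, num_channels=33, max_order=5)
```

Because each destination index is visited once, no synthesiser channel receives more than one source under this rule.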

It is also possible to combine amplitude and phase information from different analyser channels. The amplitude
signals |X_k(rR)| may be connected according to Fig. 16, whereas the phase signals arg{X_k(rR)} are connected
according to the principle of Fig. 16. In this way the lowband frequencies will still be transposed, whereby a
periodic repetition of the source region envelope is generated instead of the stretched envelope that results from a
transposition according to Eq. 2. Gating or other means may be incorporated in order to avoid amplification of
"empty" source channels. Fig. 17 illustrates another application, the generation of sub-harmonics to a highpass
filtered or bass limited signal by using connections from higher to lower subbands. When using the above
transpositions it may be beneficial to employ adaptive switching of patch based on the characteristics of the signal.

In the above description it was assumed that the highest frequency contained in the input signal was significantly
lower than the Nyquist frequency, so that the bandwidth could be expanded without an increase in
sample rate. This is however not always the case, which is why a preceding upsampling may be necessary. When using filter
bank methods for transposition, it is possible to integrate upsampling in the process.

Most perceptual codecs employ maximally decimated filter banks in the time to frequency mapping ["Introduction
to Perceptual Coding" K. Brandenburg, AES, Collected Papers on Digital Audio Bitrate Reduction, 1996]. Fig. 18a
shows the basic structure of a perceptual encoder system. [...] several subband signals. The [...]
where the number of quantization levels is determined from a perceptual model 1807 which estimates the [...]
combined with side information consisting of [...] number of bits 1811. A synthesis filter bank combines the
subband samples in order to recreate the original signal [...] In the following descriptions, there is a focus on
cosine modulated filter banks. [...]

In the illustrative, but not limiting, [...] splits the input signal x(n) into L subband signals. The generic
structure of [...]


synthesis filters are denoted F_k(z). In addition, the present invention performs a spectral replication on x(n), giving
an enhanced signal y(n).

Synthesising the subband signals with a QL-channel filter bank, where only the L lowband channels are used and the
bandwidth expansion factor Q is chosen so that QL is an integer value, will result in an output bit stream with
sampling frequency Qf_s. Hence, the extended filter bank will act as if it is an L-channel filter bank followed by an
upsampler. Since, in this case, the L(Q-1) highband filters are unused (fed with zeros), the audio bandwidth will not
change - the filter bank will merely reconstruct an upsampled version of x(n). If, however, the L subband signals
are patched to the highband filters, the bandwidth of x(n) will be increased by a factor Q, producing y(n). This is the
maximally decimated filter bank version of the basic multiband transposer, according to the invention. Using this
scheme, the upsampling process is integrated in the synthesis filtering as explained earlier. It should be noted that
any size of the synthesis filter bank may be used, resulting in different sample-rates of the output signal, and hence
different bandwidth expansion factors. Performing spectral replication on x(n) according to the present invention of
the basic multiband transposition method with an integer transposition factor M, is accomplished by patching the
subband signals as

$$v_{Mk}(n) = e_{Mk}(n)\,(-1)^{(M-1)kn}\,v_k(n), \tag{16}$$

where k ∈ [0, L-1] is chosen so that Mk ∈ [L, QL-1], e_Mk(n) is the envelope correction and (-1)^((M-1)kn) is a correction
factor for spectral inverted subbands. Spectral inversion results from decimation of subband signals, and the inverted
signals may be reinverted by changing sign on every second sample in those channels. Referring to Fig. 20, consider
a 16-channel synthesis filter bank, patched 2009 for a transposition factor M = 2, with Q = 2. The blocks 2001 and
2003 denote the analysis filters H_k(z) and the decimators of Fig. 19 respectively. Similarly, 2005 and 2007 are the
interpolators and synthesis filters F_k(z). Eq. 16 then simplifies to patching of the four upper frequency subband
signals of the received data into every second of the eight uppermost channels in the synthesis filter bank. Due to
spectral inversion, every second patched subband signal must be frequency inverted before the synthesis.
Additionally, the magnitudes of the patched signals must be adjusted 2011 according to the principles of SBR-1 or
SBR-2.
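The patching rule of Eq. 16 can be illustrated with a short sketch for the 16-channel case discussed above (L = 8, Q = 2, M = 2). The function name, the data layout and the use of a constant gain per destination channel in place of the time-varying envelope correction e_Mk(n) are assumptions made for the illustration, and Q is assumed to be an integer here.

```python
def patch_subbands(lowband, M, Q, envelope_gain=None):
    """Patch L lowband subband signals into a Q*L-channel synthesis bank (Eq. 16).

    lowband[k][n] is sample n of subband k. Returns Q*L subband signals, where
    channels that receive no patch stay zero.
    """
    L = len(lowband)
    num_samples = len(lowband[0])
    synthesis = [list(lowband[k]) for k in range(L)]
    synthesis += [[0.0] * num_samples for _ in range(L * (Q - 1))]
    for k in range(L):
        dest = M * k
        if not (L <= dest <= Q * L - 1):
            continue  # only sources whose destination Mk lies in the highband are patched
        gain = 1.0 if envelope_gain is None else envelope_gain[dest]
        for n in range(num_samples):
            # (-1)^((M-1) k n): sign change on every second sample when the exponent is odd
            sign = -1.0 if ((M - 1) * k * n) % 2 else 1.0
            synthesis[dest][n] = gain * sign * lowband[k][n]
    return synthesis

lowband = [[float(k + 1)] * 4 for k in range(8)]  # dummy subband data, L = 8
patched = patch_subbands(lowband, M=2, Q=2)
```

With these settings the four upper lowband channels 4 to 7 land in synthesis channels 8, 10, 12 and 14, and the signals from the odd source channels 5 and 7 have every second sample negated.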

Using the basic multiband transposition method according to the present invention, the generated harmonics are in
general not exact multiples of the fundamentals. All frequencies but the lowest in every subband differ to some
extent from an exact transposition. Further, the replicated spectrum contains zeros since the target interval covers a
wider frequency range than the source interval. Moreover, the alias cancellation properties of the cosine modulated
filter bank vanish, since the subband signals are separated in frequency in the target interval. That is, neighbouring
subband signals do not overlap in the high-band area. However, aliasing reduction methods, known by those skilled
in the art, may be used to reduce artifacts of this type. Advantages of this transposition method are ease of
implementation, and the very low computational cost.

To achieve perfect transposition of sinusoids, an effective maximally decimated filter bank solution of the improved
multiband transposition method is now presented. The system uses an additional modified analysis filter bank, while
the synthesis filter bank is cosine modulated as described by Vaidyanathan ["Multirate Systems and Filter Banks"
P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, New Jersey, 1993, ISBN 0-13-605718-7]. The steps for
transposition, using the improved multiband transposition method [...] expansion factor Q.

2. x_1(n) is downsampled by a factor Q to form signal x_2(n') 2103, 2205, i.e. x_2(n') = x_1(Qn').

3. [...]

4. [...] Hence, the filter bank will be oversampled by a factor M.

5. [...] converted to a polar representation [...] 2109, 2215. After this operation, the signals s_k^(M)(n'') are
critically sampled.

[...] produces the signal x_3(n).

8. x_3(n) is finally added to x_1(n) to give y(n) 2223, which is [...]

[...] a positive integer. All subband signals s_k^(M_i)(n''), where i = 1, 2, ..., m, and m is the number of
transposition factors, are added according to

$$s_k(n'') = \sum_{i=1}^{m} s_k^{(M_i)}(n'') \tag{17}$$

for every applicable k. In the first iteration of the loop of Fig. 22, the signals s_k(n'') may be considered to be subband
samples of zeros only, where k = 0, 1, ..., K-1. In every loop, the new samples are added 2219 to s_k(n'') as

$$s_k(n'') = s_k(n'') + s_k^{(M_i)}(n'') \tag{18}$$
where k = [...], ..., min(K, T_i)-1. The subband signals s_k(n'') are synthesised once with a K-channel filter
bank according to step 7.

The modified analysis filter bank of step 4, is derived through the theory of cosine modulated filter banks, where the
modulated lapped transform (MLT) ["Lapped Transforms for Efficient Transform/Subband Coding" H. S. Malvar,
IEEE Trans ASSP, vol. 38, no. 6, 1990] is a special case. The impulse responses h_k(n) of the filters in a T-channel
cosine modulated filter bank may be written

$$h_k(n) = C\,p_0(n)\cos\left\{\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right\} \tag{19}$$

where k = 0, 1, ..., T-1, N is the length of the lowpass prototype filter p_0(n), C is a constant and Φ_k is a phase-angle
that ensures alias cancellation between adjacent channels. The constraints on Φ_k are

$$\Phi_0 = \pm\frac{\pi}{4}, \qquad \Phi_{T-1} = \pm\frac{\pi}{4} \qquad \text{and} \qquad \Phi_k = \Phi_{k-1} \pm \frac{\pi}{2} \tag{20a-c}$$

which may be simplified to the closed form expression

$$\Phi_k = \pm(-1)^k\,\frac{\pi}{4} \tag{21}$$

With this choice of Φ_k, perfect reconstruction systems or approximate reconstruction systems (pseudo QMF
systems) may be obtained using synthesis filter banks with impulse responses as

$$f_k(n) = C\,p_0(n)\cos\left\{\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) - \Phi_k\right\} \tag{22}$$

Consider the filters

$$h'_k(n) = C\,p_0(n)\sin\left\{\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right\} \tag{23}$$

where h'_k(n) are sine-modulated versions of the prototype filter p_0(n). The filters H'_k(z) and H_k(z) have identical
passband supports, but the phase responses differ. The passbands of the filters are actually Hilbert transforms of
each other (this is not valid for frequencies close to ω = 0 and ω = π). Combining Eq. 19 and Eq. 23 according to

$$h_k^a(n) = h_k(n) + j\,h'_k(n) = C\,p_0(n)\exp\left\{j\left[\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right]\right\} \tag{24}$$

yields filters that have the same shape of the magnitude responses as H_k(z) for positive frequencies but are zero for
negative frequencies. Using a filter bank with impulse responses as in Eq. 24 gives a set of subband signals that may
be interpreted as the analytic (complex) signals corresponding to the subband signals obtained from a filter bank
with impulse responses as in Eq. 19. Analytic signals are suitable for manipulation, since the complex-valued
samples may be written in a polar form, that is z(n) = r(n) + j i(n) = |z(n)| exp{j arg(z(n))}. However, when using the
complex filter bank for transposition, the constraint on Φ_k has to be generalised to retain the alias cancellation
property. The new constraint on Φ_k, to ensure alias cancellation in combination with a synthesis filter bank with
impulse responses as in Eq. 22 is

$$\Phi_k = \pm(-1)^k\,\frac{\pi}{4M} \tag{25}$$

which simplifies to Eq. 21 when M = 1. With this choice, transposed partials will have the same relative phases as
they would have when M = 1 (no transposition).

Combining Eq. 24 and Eq. 25 results in

$$h_k^a(n) = C\,p_0(n)\exp\left\{j\left[\frac{\pi(2k+1)}{2T}\left(n - \frac{N-1}{2}\right) \pm (-1)^k\,\frac{\pi}{4M}\right]\right\} \tag{26}$$

which are the filters used in the modified filter bank of step 4, according to the present invention.



[...] oversampled by M, which is an essential criterion when the phase-angles subsequently are multiplied by the
transposition factor M. The oversampling [...] to the target range, to equal that of the source range. The individual
bandwidths of the transposed subband signals are M times greater than those in the source range, due to the
phase-multiplier. This makes the subband signals critically sampled after step 5, and additionally, there will be
no zeros in the spectrum when transposing tonal signals.


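In order to avoid trigonometric calculations, that is, having to compute the new subband signals as

$$s_k^{(M)}(n'') = \mathrm{real}\left\{\left|v_k^{(M)}(n'')\right|\exp\left\{jM\arctan\left(\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\right)\right\}\right\} = \left|v_k^{(M)}(n'')\right|\cos\left\{M\arctan\left(\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\right)\right\} \tag{27}$$

where |v_k^(M)(n'')| is the absolute value of v_k^(M)(n''), the following trigonometric relationship is used:

$$\cos(M\alpha) = \cos^{M}(\alpha) - \binom{M}{2}\sin^{2}(\alpha)\cos^{M-2}(\alpha) + \binom{M}{4}\sin^{4}(\alpha)\cos^{M-4}(\alpha) - \ldots \tag{28}$$

Letting

$$\alpha = \arctan\left(\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\right) \tag{29}$$

and noting that

$$\cos(\alpha) = \cos\left(\arctan\left(\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\right)\right) = \frac{\mathrm{real}\{v_k^{(M)}(n'')\}}{\left|v_k^{(M)}(n'')\right|} \tag{30}$$

and

$$\sin(\alpha) = \sin\left(\arctan\left(\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\right)\right) = \frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\left|v_k^{(M)}(n'')\right|} \tag{31}$$

the subband signals of Eq. 27 can be computed without trigonometric functions.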
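The identities of Eq. 28 to 31 can be turned into a short routine that evaluates |v| cos(M arg v) from the real and imaginary parts alone. The sketch below is illustrative only; the function name and the test values are assumptions. It also compares the result with the direct trigonometric form of Eq. 27, using the four-quadrant phase so that samples with negative real part are handled.

```python
import cmath
import math

def transposed_real_part(v, M):
    """|v| * cos(M * arg v) using only real{v}, imag{v} and |v| (Eq. 27 via Eq. 28 to 31)."""
    magnitude = math.hypot(v.real, v.imag)
    if magnitude == 0.0:
        return 0.0
    c = v.real / magnitude  # cos(alpha), Eq. 30
    s = v.imag / magnitude  # sin(alpha), Eq. 31
    total = 0.0
    for m in range(0, M + 1, 2):
        # Eq. 28: alternating signs, binomial(M, m) * sin^m(alpha) * cos^(M-m)(alpha)
        sign = -1.0 if (m // 2) % 2 else 1.0
        total += sign * math.comb(M, m) * s ** m * c ** (M - m)
    return magnitude * total

samples = [complex(0.6, 0.2), complex(-0.3, 0.7), complex(-0.5, -0.4), complex(0.1, -0.9)]
worst = 0.0
for order in (1, 2, 3, 4, 5):
    for v in samples:
        direct = abs(v) * math.cos(order * cmath.phase(v))  # reference with trigonometry
        worst = max(worst, abs(transposed_real_part(v, order) - direct))
```

For small M the sum has only two or three terms, so the cost is a handful of multiplications and one division per sample.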
When using transpositions where M is even, obstacles with the phase-multiplier may arise, depending on the
characteristics of the lowpass prototype filter p_0(n). All applicable prototype filters have zeros on the unit circle in
the z-plane. A zero on the unit circle imposes a 180° shift in the phase response of the filter. For M even, the
phase-multiplier translates these shifts to 360° shifts; i.e. the phase-shifts vanish. The partials so located in frequency that
such phase-shifts vanish will give rise to aliasing in the synthesised signal. The worst case scenario is when a partial
is located at a point in frequency corresponding to the top of the first side lobe of an analysis filter. Depending on
the rejection of this lobe in the magnitude response, the aliasing will be more or less audible. As an example, the
first side lobe of the prototype filter used in the ISO/MPEG layer 1 and 2 standard is rejected 96 dB, while the
rejection is only 23 dB for the first side lobe of the sine-window used in the MDCT scheme of the ISO/MPEG layer
3 standard. It is clear, that this type of aliasing, using the sine-window, will be audible. A solution to this problem
will be presented, and is referred to as relative phase locking.

The filters h_k^a(n) all have linear phase responses. The phase-angles Φ_k introduce relative phase differences between
adjacent channels, and the zeros on the unit circle introduce 180° phase-shifts at locations in frequency that may
differ between channels. By monitoring the phase-difference between neighbouring subband signals, before the
phase-multiplier is activated, it is easy to detect the channels that contain phase-inverted information. Considering
tonal signals, the phase-difference is approximately π/2M, according to Eq. 25, for non-inverted signals, and
consequently approximately π(1-1/2M) for signals where either of the signals is inverted. The detection of inverted
signals may be accomplished simply by computing the dot product of samples in adjacent subbands as

$$v_k^{(M)}(n'') \circ v_{k+1}^{(M)}(n'') = \mathrm{real}\{v_k^{(M)}(n'')\}\,\mathrm{real}\{v_{k+1}^{(M)}(n'')\} + \mathrm{imag}\{v_k^{(M)}(n'')\}\,\mathrm{imag}\{v_{k+1}^{(M)}(n'')\} \tag{32}$$

If the product in Eq. 32 is negative, the phase-difference is greater than 90°, and a phase-inversion condition is
present. The phase-angles of the complex-valued subband signals are multiplied by M, according to the scheme of
step 5, and finally, the inversion-tagged signals are negated. The relative phase locking method thus forces the 180°
shifted subband signals to retain this shift after the phase-multiplication, and hence maintain the aliasing
cancellation properties.
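A minimal sketch of relative phase locking for one pair of adjacent subband samples is given below. It rests on assumptions that the text does not settle: the function names are hypothetical, and the second sample of the pair is arbitrarily treated as the one carrying the inversion tag. The sketch detects an inversion from the sign of the dot product of Eq. 32, multiplies both phase-angles by M, and negates the tagged sample.

```python
import cmath
import math

def dot_product(a, b):
    """Eq. 32: treat two complex samples as 2-D vectors."""
    return a.real * b.real + a.imag * b.imag

def multiply_phase(sample, M):
    magnitude, argument = cmath.polar(sample)
    return cmath.rect(magnitude, M * argument)

def transpose_pair(v_k, v_next, M):
    """Phase-multiply two adjacent subband samples and restore a detected 180 degree shift."""
    inverted = dot_product(v_k, v_next) < 0.0  # phase difference greater than 90 degrees
    y_k = multiply_phase(v_k, M)
    y_next = multiply_phase(v_next, M)
    if inverted:
        y_next = -y_next  # assumption: the second sample of the pair carries the tag
    return y_k, y_next, inverted

M = 2
v_k = cmath.rect(1.0, 0.3)
v_plain = cmath.rect(1.0, 0.3 + math.pi / (2 * M))  # non-inverted neighbour, pi/2M apart
v_flipped = -v_plain                                # same neighbour with a 180 degree shift
_, y_plain, tag_plain = transpose_pair(v_k, v_plain, M)
_, y_flipped, tag_flipped = transpose_pair(v_k, v_flipped, M)
```

Without the final negation the two outputs would coincide for M = 2, since the 180° shift becomes 360°; with it, `y_flipped` stays the negative of `y_plain`.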

Spectral envelope adjustment 

Most sounds, like speech and music, are characterised as products of slowly varying envelopes and rapidly varying
carriers with constant amplitude, as described by Stockham ["The Application of Generalized Linearity to
Automatic Gain Control" T.G. Stockham, Jr, IEEE Trans. on Audio and Electroacoustics, Vol. AU-16, No. 2, June
1968] and Eq. 1.



In split-band perceptual audio coders, the audio signal is segmented into frames and split into multiple frequency
bands using subband filters or a time-to-frequency domain transform. In most codec types, the signal is
subsequently separated into two major signal components for transmission or storage, the spectral envelope
representation and the normalised subband samples or coefficients. Throughout the following description, the term
"subband samples" or "coefficients" refers to sample values obtained from subband filters as well as coefficients
obtained from a time-to-frequency transform. The term "spectral envelope" or "scale factors" represents values of the
subbands on a time-frame basis, such as the average or maximum magnitude in each subband, used for
normalisation of the subband samples. However, the spectral envelope may also be obtained using linear prediction,
LPC [U.S. Pat. 5,684,920]. In a typical codec, the normalised subband samples require coding at a high bitrate
(using approximately 90% of the available bitrate), compared to the slowly varying temporal envelopes, and thus the
spectral envelopes, that may be coded at a much-reduced rate (using approximately 10% of the available bitrate).

Accurate spectral envelope of the replicated bandwidth is important if the timbral qualities of the original signal are
to be preserved. The perceived timbre of a musical instrument, or voice, is mainly determined by the spectral
distribution below a certain frequency located in the highest octaves of hearing. The spectral details above that
frequency are thus of less importance, and consequently the highband fine structures obtained by the above
transposition methods require no adjustment, while the coarse structures generally do. In order to enable such
adjustment, it is useful to filter the spectral representation of the signal to separate the envelope coarse structure
from the fine structure.

In the SBR-1 implementation according to the present invention, the highband coarse spectral envelope is estimated
from the lowband information available at the decoder. This estimation is performed by continuously monitoring the
envelope of the lowband and adjusting the highband spectral envelope according to specific rules. A novel method
to accomplish the envelope estimation uses asymptotes in a logarithmic frequency-magnitude space, which is
equivalent to curve fitting with polynomials of varying order in the linear space. The level and slope of an upper
portion of the lowband spectrum are estimated, and the estimates are used to define the level and slope of one or
several segments representing the new highband envelope. The asymptote intersections are fixed in frequency and
act as pivot points. Although not always necessary, it is beneficial to stipulate constraints to keep the highband
envelope excursions within realistic boundaries. An alternative approach to estimation of the spectral envelope is to
use vector quantization, VQ, of a large number of representative spectral envelopes, and store these in a lookup-table
or codebook. Vector quantization is performed by training the desired number of vectors on a vast amount of
training data, in this case audio spectral envelopes. The training is usually done with the Generalised Lloyd
Algorithm ["Vector Quantization and Signal Compression" A. Gersho, R. M. Gray, Kluwer Academic Publishers,
USA 1992, ISBN 0-7923-9181-0], and yields vectors that optimally cover the contents of the training data.
Considering a VQ codebook consisting of A spectral envelopes trained by B envelopes (B ≫ A), then the A
envelopes represent the A most likely transitions from the lowband envelope to the highband envelope based on B
observations of a wide variety of sounds. This is, theoretically, the A optimum rules for predicting the envelope
based on the B observations. When estimating a new highband spectral envelope, the original lowband envelope is
used to search the codebook and the highband part of the best matching codebook entry is applied to create the new
highband spectrum.
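The codebook search described above can be sketched as a nearest-neighbour lookup. Everything specific in the example is assumed for illustration: the function name, the squared-error distance, the split into three lowband and two highband values, and the codebook entries themselves (made-up levels in dB, not trained data).

```python
def estimate_highband_envelope(lowband_envelope, codebook, num_lowband):
    """Return the highband part of the codebook entry whose lowband part matches best."""
    def distance(entry):
        # Assumed distance measure: squared error over the lowband part of the entry
        return sum((a - b) ** 2 for a, b in zip(entry[:num_lowband], lowband_envelope))
    best = min(codebook, key=distance)
    return best[num_lowband:]

# Each entry holds a full-band envelope: 3 lowband values followed by 2 highband values (dB).
codebook = [
    [-10.0, -14.0, -18.0, -24.0, -30.0],
    [-6.0, -6.5, -7.0, -7.5, -8.0],
    [-20.0, -30.0, -40.0, -55.0, -70.0],
]
observed_lowband = [-9.0, -15.0, -19.0]
highband = estimate_highband_envelope(observed_lowband, codebook, num_lowband=3)
```

In this toy case the first entry is the closest match, so its highband part is used for the replicated band.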



In Fig. 23, the normalised subband samples are represented by 2301 and the spectral envelopes are represented by
the scalefactors 2305. For illustrative purposes the transmission to decoder 2303 is shown in parallel form. In the
SBR-2 method, Fig. 24, the spectral envelope information is generated and transmitted according to Fig. 23, whereby
only the lowband subband samples are transmitted. Transmitted scalefactors thus span the full frequency range
while the subband samples only span a restricted frequency range, excluding the highband. At the decoder the
lowband subband samples 2401 are transposed 2403 and combined with the received highband spectral envelope
information 2405. In this way the synthetic highband spectral envelope is identical to that of the original while
maintaining a significant bit rate reduction.

In some codecs, it is possible to transmit the scalefactors for the entire spectral envelope while omitting the
highband subband samples, as shown in Fig. 24. Other codec standards stipulate that scalefactors and subband
samples must cover the same frequency range, i.e. scale-factors cannot be transmitted if the subband samples are
omitted. In such cases, there are several solutions; the highband spectral envelope information can be transmitted in
separate frames, where the frames have their own headers and optional error protection, followed by the data.
Regular decoders, not taking advantage of the present invention, will not recognise the headers and therefore discard
the extra frames. In a second solution, the highband spectral envelope information is transmitted as auxiliary data
within the encoded bitstream. However, the available auxiliary data field must be large enough to hold the envelope
information. In cases where none of the first two solutions are adaptable, a third solution, where the highband
spectral envelope information is hidden as subband samples, may be applied. Subband scalefactors cover a large
dynamic range, typically exceeding 100 dB. It is thus possible to set an arbitrary number of subband scalefactors,
2505 in Fig. 25, to very low values, and to transmit the highband scalefactors "camouflaged" as subband samples,
2501. This way of transmitting the highband scale factors to the decoder 2503 ensures compatibility with the
bitstream syntax. Hence, arbitrary data may be transmitted in this fashion. A related method exists where
information is coded into the subband sample stream [U.S. Pat. 5,687,191]. A fourth solution, Fig. 26, can be
applied when a coding system uses Huffman- or other redundancy coding 2603. The subband samples for the
highband are then set to zero 2601 or a constant value as to achieve a high redundancy.

Transient response improvements 

Transient related artifacts are common problems in audio codecs, and similar artifacts occur in the present invention.
In general, patching generates spectral "zeros" or notches, corresponding to time domain pre- and post-echoes, i.e.
spurious transients before and after "true" transients. Albeit the P-blocks "fill in the zeros" for slowly varying tonal
signals, the pre- and post-echoes remain. The improved multiband method is intended to work on discrete sinusoids,
where the number of sinusoids is restricted to one per subband. Transients or noise in a subband can be viewed as a
large number of discrete sinusoids within that subband. This generates intermodulation distortion. These artifacts are
considered as additional quantization-noise sources connected to the replicated highband channels during transient
intervals. Traditional methods to avoid pre- and post-echo artifacts in perceptual audio coders, for example adaptive
window switching, may hence be used to enhance the subjective quality of the improved multiband method. By
using the transient detection provided by the codec or a separate detector and reducing the number of channels under
transient conditions the "quantization noise" is forced not to exceed the time-dependent masking threshold. A
smaller number of channels is used during transient passages whereas a larger is used during tonal passages. Such
adaptive window switching is commonly used in codecs in order to trade frequency resolution for time resolution.
Different methods may be used in applications where the filterbank size is fixed. One approach is to shape the
"quantization noise" in time via linear prediction in the spectral domain. The transposition is then performed on the
residual signal, which is the output of the linear prediction filter. Subsequently, an inverse prediction filter is applied
to the original- and spectral replicated channels simultaneously. Another approach employs a compander system i.e.
dynamic amplitude compression of the transient signal prior to transposition or coding, and a complementary
expansion after transposition. It is also possible to switch between transposition methods in a signal dependent
manner, for example, a high resolution filterbank transposition method is used for stationary signals, and a
time-variant pattern search prediction method is employed for transient signals.

Practical implementations

Using a standard signal-processor or a powerful PC, realtime operation of an SBR-enhanced codec is possible. The
enhanced codec may also be hard-coded on a custom chip. It may also be implemented in various kinds of
systems for storage or transmission of signals, analogue or digital, using arbitrary codecs, Fig. 27 and Fig. 28. The
SBR-1 method may be integrated in a decoder or supplied as an add-on hardware or software post-processing
module. The SBR-2 method needs additional modification of the encoder. In Fig. 27 the analogue input signal is fed
to the A/D-converter 2701, forming a digital signal which is fed to an arbitrary encoder 2703, where source
coding is performed. The signal fed into the system may be of such a low-pass type that spectral bands within the
auditory range already have been discarded, or spectral bands are discarded in the arbitrary encoder. The resulting
lowband signals are fed to the multiplexer 2705, forming a serial bitstream which is transmitted or stored 2707. The
de-multiplexer 2709 restores the signals and feeds them to an arbitrary decoder 2711. The spectral envelope
information 2715 is estimated at the decoder 2713 and fed to the SBR-1 unit 2713 which transposes the lowband
signal to a highband signal and creates an envelope adjusted wideband signal. Finally, the digital wideband signal is
converted 2717 to an analogue output signal.

The SBR-2 method needs additional modification of the encoder. In Fig. 28 the analogue input signal is fed to the
A/D-converter 2801, forming a digital signal which is fed to an arbitrary encoder 2803, where source coding is
performed. The spectral envelope information is extracted 2805. The resulting signals, lowband subband samples or
coefficients and wideband envelope information, are fed to the multiplexer 2807, forming a serial bitstream which is
transmitted or stored 2809. The de-multiplexer 2811 restores the signals, lowband subband samples or coefficients
and wideband envelope information, and feeds them to an arbitrary decoder 2815. The spectral envelope
information 2813 is fed from the de-multiplexer 2811 to the SBR-2 unit 2817 which transposes the lowband signal
to a highband signal and creates an envelope adjusted wideband signal. Finally, the digital wideband signal is
converted 2819 to an analogue output signal.

When only very low bitrates are available (Internet and slow telephone modems, AM-broadcasting etc.), mono
coding of the audio program material is unavoidable. In order to improve the perceived quality and make the
programme more pleasant sounding, a simple "pseudo-stereo" generator, Fig. 29, is obtained by the introduction of
[...] channel in addition to the original mono signal 2905. The pseudo-stereo generator offers a valuable perceptual
improvement at a low computational cost. 
CLAIMS 



1. A method for enhancement of a source coding system where said source coding system comprises an encoder 
representing all operations performed prior to storage or transmission, and a decoder representing all operations 
performed after storage or transmission, characterised by: 

at said encoder, discarding frequency band(s) of an original signal, forming a first signal; 
at said decoder, by means of transposition performed on said first signal, replicating frequency band(s) of 
said original signal, forming a second signal; and 

combining said first and said second signal, forming an output signal; whereby reduced bitrate at a given 
perceptual quality or improved perceptual quality at a given bitrate is obtained. 

2. A method according to claim 1, characterised in that passband(s) of said second signal are set not to overlap or 
only partly overlap passband(s) of said first signal. 

3. A method according to claims 1 - 2, characterised in that spectral envelope adjustment is performed based on 
estimation of the spectral envelope of said discarded frequency band(s) of said original signal using said first signal. 

4. A method according to claims 1 - 2, characterised in that spectral envelope adjustment is performed based on 
transmitted envelope information of said discarded frequency band(s) of said original signal. 

5. A method according to claim 4, characterised in that said spectral envelope information is transmitted as 
subband samples in an arbitrary number of subband channels where the gains of said subband channels are set to a 
low level; whereby compatibility with standardised decoders is ensured. 

6. A method according to claim 4, characterised in that said envelope information is transmitted as scale factors 
without transmission of the corresponding subband samples. 

7. A method according to claim 4, characterised in that said envelope information is transmitted as scale factors 
and the corresponding subband samples are set to zero or a constant value; whereby the entropy of the subband 
samples is reduced. 

8. A method according to claims 1 - 7, characterised in that said output signal, when monophonic audio, is split 
into two signals each comprised of said output signal and delayed versions of the same to obtain a pseudo-stereo 
signal. 

9. A method according to claims 1 - 7, where said transposition is characterised by: 

filtering a signal through a set of $N \geq 2$ bandpass filters with passbands comprising the frequencies 
$[f_1, \ldots, f_N]$ respectively, forming N bandpass signals; 

shifting said bandpass signals in frequency to regions comprising the frequencies $M[f_1, \ldots, f_N]$ where $M \neq 1$ is 
the transposition factor; and 

forming a transposed signal by combining said shifted bandpass signals. 






10. A method according to claim 9, characterised in that said frequency shifting is obtained through upper side 
band (USB) modulation. 
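
The upper side band modulation named in claim 10 can be illustrated with a short sketch. It assumes the shift is applied to a single real bandpass signal by forming its analytic signal and multiplying by a complex exponential. The function names, the naive DFT and the bin-based shift parameter are illustrative choices, not taken from the specification.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo."""
    N = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            acc += x[n] * cmath.exp(sign * 2j * math.pi * k * n / N)
        out.append(acc / N if inverse else acc)
    return out

def analytic_signal(x):
    """Suppress the negative-frequency bins so only the upper side band remains."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        if k == 0 or 2 * k == N:
            continue  # DC and Nyquist bins are kept unchanged
        X[k] = 2 * X[k] if 2 * k < N else 0j
    return dft(X, inverse=True)

def usb_shift(x, shift_bins):
    """Shift a real bandpass signal upwards by shift_bins DFT bins (assumed unit)."""
    N = len(x)
    xa = analytic_signal(x)
    return [(xa[n] * cmath.exp(2j * math.pi * shift_bins * n / N)).real
            for n in range(N)]
```

For a band centred at frequency f, a shift of (M - 1) f moves it to M f, which is the region required by claim 9 for that band.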

11. A method for transposition by a factor M, characterised by: 

bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex- 
valued subband signals of lowpass type are generated; 

patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk, $M \neq 1$, in 
a synthesis filter bank or transform; and 

forming a transposed signal using said synthesis filter bank or transform. 

12. A method according to claim 11, characterised in that said filter bank is maximally decimated and said 
patching is performed according to the relation 

$$v_{Mk}(n) = (-1)^{(M-1)kn}\, v_k(n),$$

where $(-1)^{(M-1)kn}$ is a correction factor, $v_k(n)$ the subband signal of channel k, and $v_{Mk}(n)$ the subband signal of 
channel Mk; whereby compensation of spectral inverted subband signals is obtained. 
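
A minimal sketch of the patching in claims 11 and 12, assuming an integer M and subband signals stored as plain lists: analysis channel k is copied to synthesis channel Mk and multiplied by the correction factor (-1)^((M-1)kn). The function name and the data layout are hypothetical.

```python
def patch_subbands(analysis, M, num_synth_channels):
    """Patch analysis channel k to synthesis channel M*k.

    analysis[k][n] holds v_k(n). Synthesis channels that receive no patch
    stay at zero. An integer M != 1 is assumed for this sketch.
    """
    if M == 1:
        raise ValueError("M must differ from 1")
    length = len(analysis[0])
    synth = [[0.0] * length for _ in range(num_synth_channels)]
    for k, v_k in enumerate(analysis):
        target = M * k
        if target >= num_synth_channels:
            break
        for n, sample in enumerate(v_k):
            # Correction factor (-1)^((M-1)*k*n): +1 for an even exponent, -1 for an odd one.
            sign = -1.0 if ((M - 1) * k * n) % 2 else 1.0
            synth[target][n] = sign * sample
    return synth
```

With M = 2 the sign alternates from sample to sample in every odd-numbered analysis channel and is constant in the even-numbered ones.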

13. A method according to claims 11 - 12, characterised by: 

patching the phases of the subband signals from channels k of said analysis filter bank or transform as the 
phases of the subband signals associated with synthesis channels Mk, $M \neq 1$; and 

patching the magnitudes of the subband signals from consecutive channels l of said analysis filter bank or 
transform as the magnitudes of the subband signals associated with consecutive synthesis channels l + S where S is 
an integer $\neq 1$. 

14. A method according to claims 11 - 13, characterised in that the phases of said subband signals of said channels 
k are multiplied by said factor M before using said synthesis filter bank or transform. 



15. A method according to claims 11 - 14, characterised in that $M = K^{\pm 1}$, where K is an integer > 1. 



16. A method according to claims 11 - 15, characterised in that said patching employs multiple values of said 
transposition factor M. 
17. A method for transposition by a factor M, characterised by: 

filtering a signal through a parallel bank of L filters with impulse responses as 

$$h_k(n) = K p_0(n) \exp\left\{ j\left[ \frac{\pi}{2L}(2k+1)\left(n - \frac{N-1}{2}\right) + (-1)^k\frac{\pi}{4M} \right] \right\},$$

where k = 0, 1, ..., L-1, K is a constant, and $p_0(n)$ is a lowpass prototype filter of length N, producing a set of L 
complex-valued signals; 

downsampling said L signals with a factor L/M, producing a set of L complex-valued subband signals; 
multiplying the phase-angles of said complex-valued subband signals by M, giving a new set of subband 
signals; 

selecting the real part of said new set of subband signals, resulting in a set of L real-valued subband signals; 
upsampling a subset of said real-valued subband signals with a factor L', producing a set of real-valued 
signals; 

filtering said real-valued signals through a parallel bank of L' filters with impulse responses as 

$$f_k(n) = K' p'_0(n) \cos\left\{ \frac{\pi}{2L'}(2k+1)\left(n - \frac{N'-1}{2}\right) - (-1)^k\frac{\pi}{4} \right\},$$

where k = 0, 1, ..., L'-1, K' is a constant and $p'_0(n)$ is a lowpass prototype filter of length N', forming a set of L' 
filtered signals; and 

adding said L' filtered signals to produce a transposed signal. 

18. A method according to claim 17, characterised in that said multiplication of said phase-angles and said 
selecting of the real part, is computed by: 

writing said complex-valued subband signals as 

$$Z_k(n) = R_k(n) + jI_k(n),$$

where $R_k(n)$ and $I_k(n)$ are the real and imaginary parts of $Z_k(n)$ respectively; 
calculating said real-valued subband signals $W_k(n)$ as 

$$W_k(n) = |Z_k(n)| \cos\left\{ M \arctan\left( \frac{I_k(n)}{R_k(n)} \right) \right\},$$

where $|Z_k(n)| = \sqrt{R_k(n)^2 + I_k(n)^2}$ and M is a positive integer transposition factor, using the trigonometric identity 

$$\cos(M\alpha) = \cos^M(\alpha) - \binom{M}{2}\sin^2(\alpha)\cos^{M-2}(\alpha) + \binom{M}{4}\sin^4(\alpha)\cos^{M-4}(\alpha) - \ldots,$$

where $\alpha = \arctan\{I_k(n)/R_k(n)\}$, and the relations 

$$\cos(\alpha) = \frac{R_k(n)}{|Z_k(n)|} \quad\text{and}\quad \sin(\alpha) = \frac{I_k(n)}{|Z_k(n)|};$$

whereby reducing computational complexity by elimination of all trigonometric calculations. 
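
The computation in claim 18 can be sketched as follows. The sketch evaluates |Z| cos(M alpha) from the real and imaginary parts alone, using the binomial expansion of cos(M alpha) with cos(alpha) = R/|Z| and sin(alpha) = I/|Z|, so no trigonometric function is called. The function name is hypothetical, and the angle is taken as the four-quadrant argument of Z, which is what the two relations define.

```python
import math

def phase_multiplied_real_part(r, i, M):
    """Return |Z| * cos(M * arg(Z)) for Z = r + j*i without trigonometric calls."""
    mag = math.sqrt(r * r + i * i)
    if mag == 0.0:
        return 0.0
    c = r / mag  # cos(alpha)
    s = i / mag  # sin(alpha)
    total = 0.0
    # cos(M a) = sum over j of (-1)^j * C(M, 2j) * sin^(2j)(a) * cos^(M-2j)(a)
    for j in range(M // 2 + 1):
        term = math.comb(M, 2 * j) * (s ** (2 * j)) * (c ** (M - 2 * j))
        total += -term if j % 2 else term
    return mag * total
```

For M = 2 the loop reduces to |Z| (cos^2(alpha) - sin^2(alpha)), that is (R^2 - I^2) / |Z|.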
19. A method according to claim 17, characterised by: 

on a block basis, extracting information conveyed by the phase-difference of an adjacent pair of said 
complex-valued subband signals; 

performing said multiplication of said phase-angles by M, forming a pair of said new subband signals; 

[...] shifts of the subband signals are retained when employing an even integer valued M. 

20. A method according to claim 19, characterised in that said information is given by the dot-product of 
complex-valued subband signals $Z_k(n)$ and $Z_{k+1}(n)$ according to 

$$Z_k(n) \circ Z_{k+1}(n) = R_k(n)R_{k+1}(n) + I_k(n)I_{k+1}(n),$$

where $R_l(n)$ and $I_l(n)$ are the real and imaginary parts of $Z_l(n)$ respectively, $l = k, k+1$, and one of said new subband 
signals is negated provided said dot-product is negative. 
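
The dot-product test of claim 20 can be sketched per sample as below. This is a simplification, since claim 19 extracts the information on a block basis, and the choice of negating the second signal of the pair is an assumption. Function names are hypothetical.

```python
import cmath

def multiply_phase(z, M):
    """Multiply the phase angle of complex sample z by M, keeping its magnitude."""
    return cmath.rect(abs(z), M * cmath.phase(z))

def transpose_pair(z_k, z_k1, M):
    """Phase-multiply an adjacent pair of subband samples.

    The second new sample is negated when the dot-product of the two
    original samples is negative (assumed choice of which one to negate).
    """
    dot = z_k.real * z_k1.real + z_k.imag * z_k1.imag
    w_k = multiply_phase(z_k, M)
    w_k1 = multiply_phase(z_k1, M)
    if dot < 0:
        w_k1 = -w_k1
    return w_k, w_k1
```

With M = 2, two samples in antiphase would both end up with the same phase after doubling. The negation restores the sign difference between them.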

21. A method for transposition, by expanding or compressing a first signal in time and duplicating or discarding 
arbitrary long segments of said first signal, subsequently down- or up-sample said first signal, characterised by: 

performing transient detection on said first signal; 

determining which segment of said first signal to be used when duplicating or discarding parts of said first 
signal depending on the outcome of the transient detection; 

adjusting the length L of said signal segment depending on the outcome of the transient detection; 

adjusting the number of samples N used for each statevector depending on the outcome of the transient 
detection; 

adjusting the delay D between samples in said statevector depending on the outcome of the transient 
detection; 

adjusting the number of samples K between each statevector depending on the outcome of the transient 
detection; and 

searching for synchronisation points in chosen segment of said first signal, based on the synchronisation 
point found in the previous synchronisation point search. 

22. A method according to claim 21, characterised in that several transposers are interconnected in order to share 
synchronisation point information; whereby reduced computational complexity is achieved. 

23. [...] bank, and the signals fed into each of said transposers are filtered as to obtain an arbitrary spectral envelope of the 
new signal being the sum of said signals being processed by said transposers. 






24. A system for enhanced decoding of a source coded signal derived from an original signal, characterised by: 
transposition means for transposing frequency band(s) of said source coded signal, forming a first signal; 
estimation means operating on said source coded signal, for estimation of the spectral envelope of said 
original signal; 

adjusting means for adjusting the spectral envelope of said first signal, based on said estimation; 
combining means for combining said source coded signal and said adjusted first signal; whereby reduced 
bitrate at a given perceptual quality or improved perceptual quality at a given bitrate is achieved. 

25. An apparatus according to claim 24, operating when said output signal is monophonic audio characterised by: 
delaying means for delaying and attenuating means for attenuating said output signal forming a first delayed 
signal; 

delaying means for delaying and attenuating means for attenuating said output signal, using different 
parameters, forming a second delayed signal; 

means for adding said output and said first delayed signal, forming a left-channel output signal; and 
means for adding said output and said second delayed signal, forming a right-channel output signal; whereby 
obtaining a pseudo stereophonic signal. 
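
A minimal sketch of the pseudo-stereo structure in claim 25: each output channel is the mono signal plus one delayed and attenuated copy, with different parameters per channel. The function names and the (delay, gain) parameter tuples are hypothetical, and the values in any call are illustrative rather than taken from the specification.

```python
def delayed_attenuated(signal, delay, gain):
    """Return the signal delayed by `delay` samples and scaled by `gain`."""
    out = [0.0] * len(signal)
    for n in range(delay, len(signal)):
        out[n] = gain * signal[n - delay]
    return out

def pseudo_stereo(mono, left_params, right_params):
    """Build left and right channels from a mono signal.

    left_params and right_params are (delay_in_samples, gain) tuples and
    should differ so that the two channels are decorrelated.
    """
    left_tap = delayed_attenuated(mono, *left_params)
    right_tap = delayed_attenuated(mono, *right_params)
    left = [m + d for m, d in zip(mono, left_tap)]
    right = [m + d for m, d in zip(mono, right_tap)]
    return left, right
```

The cost is one multiplication and one addition per sample and channel, in line with the low computational cost stated in the description.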

26. A system for enhanced source coding where said system comprises an encoder representing all units preceding a 
storage media or transmission channel, and a decoder representing all units following said storage media or 
transmission channel, characterised by: 

discarding means at said encoder for discarding frequency band(s) of an original signal, forming a first 
signal; 

extracting means at said encoder for extracting spectral envelope information of said original signal, forming 
a second signal; 

means at said encoder for coding of said first and second signals; 

transposition means at said decoder for transposing frequency band(s) of said first signal, forming a third 
signal; 

adjusting means at said decoder for spectral envelope adjustment of said third signal, based on said second 
signal; and 

combining means at said decoder for combining said first and said adjusted third signal; whereby reduced 
bitrate at a given perceptual quality or improved perceptual quality at a given bitrate is achieved. 
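
The envelope extraction and adjustment of claim 26 can be sketched in the subband domain. The sketch assumes the envelope is represented as one RMS value per discarded subband channel. That representation, the function names and the list layout are assumptions made for illustration only.

```python
import math

def channel_rms(samples):
    """Root-mean-square level of one subband channel."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def extract_envelope(highband_channels):
    """Encoder side: one RMS value per discarded subband channel."""
    return [channel_rms(ch) for ch in highband_channels]

def adjust_envelope(replicated_channels, envelope):
    """Decoder side: scale each replicated channel to the transmitted RMS."""
    adjusted = []
    for ch, target in zip(replicated_channels, envelope):
        rms = channel_rms(ch)
        gain = target / rms if rms > 0.0 else 0.0
        adjusted.append([gain * s for s in ch])
    return adjusted
```

Only the envelope values cross the channel. The fine structure of the highband comes from the transposed lowband, and the gain restores the level of the original signal in each channel.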

27. An apparatus according to claim 26, operating when said output signal is monophonic audio characterised by: 
delaying means for delaying and attenuating means for attenuating said output signal forming a first delayed 
signal; 

delaying means for delaying and attenuating means for attenuating said output signal, using different 
parameters, forming a second delayed signal; 

means for adding said output and said first delayed signal, forming a left-channel output signal; and 
means for adding said output and said second delayed signal, forming a right-channel output signal; whereby 
obtaining a pseudo stereophonic signal. 
28. An apparatus for transposition by a factor M, characterised by: 

[...] an analysis filter bank or transform of such a nature that real- or complex-valued subband signals of lowpass type are generated; 

means for patching an arbitrary number of channels k of said analysis filter bank or transform to channels 
Mk, $M \neq 1$, in a synthesis filter bank or transform; and 

forming a transposed signal by means of said synthesis filter bank or transform. 

29. An apparatus according to claim 28, [...] 

of the subband signals of [...] 

30. An apparatus for transposition by a factor M, characterised by: 

filtering means for filtering a signal through a parallel bank of L filters with impulse responses as 

$$h_k(n) = K p_0(n) \exp\left\{ j\left[ \frac{\pi}{2L}(2k+1)\left(n - \frac{N-1}{2}\right) + (-1)^k\frac{\pi}{4M} \right] \right\},$$

[...] factor, producing a set of L complex-valued signals; 

[...] said L signals [...], producing a set of L complex-valued subband signals; 

[...] subband signals; 

[...] selecting the real part of said new set of subband signals [...]; 

[...] values; 

filtering means for filtering said real-valued signals through a parallel bank of L' filters with impulse 
responses as 

$$f_k(n) = K' p'_0(n) \cos\left\{ \frac{\pi}{2L'}(2k+1)\left(n - \frac{N'-1}{2}\right) - (-1)^k\frac{\pi}{4} \right\},$$

[...]; and 

means for adding said L' filtered signals to produce the transposed signal. 
31. An apparatus for transposition, by expanding or compressing a first signal in time by duplicating or discarding 
arbitrary long segments of said first signal, and subsequently down- or up-sample said first signal, characterised 
by: 

detection means for performing transient detection on said first signal; 
means for using the position of a possible transient when determining which segment of said first signal to be 
used when duplicating or discarding parts of said first signal, in order to obtain said transposition; 

adjusting means for adjusting the length (L) of said signal segment depending on the output from the 
transient detector; 

adjusting means for adjusting the number of samples (N) used for each statevector depending on the output 
from the transient detector; 

adjusting means for adjusting the delay (D) between samples in said statevector depending on the output 
from the transient detector; 

adjusting means for adjusting the number of samples (K) between each statevector depending on the output 
from the transient detector; and 
searching means for searching for synchronisation points in chosen segment of said first signal, based on the 
synchronisation point found in the previous synchronisation point search. 

32. An apparatus according to claim 31, operating on subband signals characterised by: 

means for sharing synchronisation information between multiple instances of said transposer; 
means for forming subsets of said subband signals; 

means for amplitude adjustment of every channel within each of said subsets; 

synthesis filter bank means for forming, from each of said subsets, an input signal to each instance of said 
transposers; 

processing said input signals by said transposers; and 
summation means for acquiring a new signal by summation of said processed signals; whereby an arbitrary 
spectral envelope can be obtained. 